We claim:

1. A method of classifying a human colorectal tumor comprising the steps of: 
obtaining an unknown sample derived from a human colorectal tumor; 
determining the gene expression level of each of at least five informative genes in the 
unknown sample, wherein informative genes are selected from the group consisting of: 

a first group of genes that are expressed at higher levels in Dukes' C tumors than 
in Dukes' B tumors, wherein the first group consists of H2BFH, H2BFJ, LENEP,
SPON1, PC, PADI2, SPAM1, PHKA1, NKX2H, RAB7, SPN, BAIAP3, DVL2,
GRB14, SRCAP, GTF3C1, RUNX1, SYNGR3, PPL, NPIP, BPAG1, ARF6, PNMA1,
UPLC1, BAI3, FLJ22596, FLJ10251, KIAA1827, CLIPR-59, KIAA0903, KIAA0992,
FLB4237, clone HRC00953, cDNA DKFZp586C1923, cDNA DKFZp434K1126,
DNCH1, DRPLA, ADPRTL2, HSPCA, YWHAZ, AP2A2, GRWD, BICD1, MARK3,
FLJ10482, FLJ11588, ELOVL5, EUROIMAGE 29222, and DKFZP564A2416, and
a second group of genes that are expressed at lower levels in Dukes' C tumors 
than in Dukes' B tumors, wherein the second group consists of FUT4, NDUFB7,
ASMT, CASP5, PSMF1, PTGER3, FLB0708, RPS21, KIAA0734, FZD3, EIF1A,
NEBL, SRI, KCNJ6, ANKRD5, SSX2, AF098968, DJ971N18.2, HSU79252, SP329, 
FLJ20420, DTIPIB, cDNA DKFZp434E0528, PTPRA, BS69, CREG, PIGL, ILVBL, 
PANK2, ATP5J, CGI-01, and UBE2N; 
comparing the gene expression level of each of the at least five informative genes in the 
unknown sample to the average gene expression level of that gene in a plurality of reference 
samples that are Dukes' B stage to determine if each of the at least five informative genes is 
expressed at higher or lower levels in the unknown sample relative to the plurality of reference 
samples; and, 

classifying the unknown sample as Dukes' C stage if each of the informative genes 
selected from the first group is expressed at higher levels in the unknown sample than the 
average expression of that gene in the plurality of reference samples and if each of the 
informative genes selected from the second group is expressed at lower levels in the unknown 
sample than the average expression of that gene in the plurality of reference samples. 
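
For illustration only, the comparing and classifying steps of claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name `is_dukes_c`, the dictionary-based data layout, and the abbreviated gene sets are all hypothetical, and the full group memberships are those recited in the claim.

```python
from statistics import mean

# Illustrative subsets only; the claim recites the full membership of each group.
UP_IN_DUKES_C = {"H2BFJ", "PADI2", "ARF6", "GTF3C1", "KIAA1827"}
DOWN_IN_DUKES_C = {"FUT4", "ASMT", "PTGER3", "SRI", "ANKRD5"}


def is_dukes_c(sample, dukes_b_references, min_genes=5):
    """Apply the all-genes criterion of claim 1 to one unknown sample.

    sample: dict mapping gene symbol -> expression level in the unknown sample.
    dukes_b_references: list of dicts of the same shape, one per Dukes' B
        reference sample (hypothetical layout).
    """
    informative = [g for g in sample if g in UP_IN_DUKES_C or g in DOWN_IN_DUKES_C]
    if len(informative) < min_genes:
        raise ValueError("at least %d informative genes are required" % min_genes)
    for gene in informative:
        # Average expression of this gene across the Dukes' B reference samples.
        baseline = mean(ref[gene] for ref in dukes_b_references)
        if gene in UP_IN_DUKES_C and sample[gene] <= baseline:
            return False  # a first-group gene is not higher than the reference
        if gene in DOWN_IN_DUKES_C and sample[gene] >= baseline:
            return False  # a second-group gene is not lower than the reference
    return True
```

Note that the criterion is conjunctive: a single informative gene moving in the wrong direction is enough to withhold the Dukes' C classification.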

2. The method of claim 1 wherein at least twenty informative genes are analyzed at each step.

3. The method of claim 2 wherein the at least twenty informative genes are NKX2H, H2BFJ, 
PADI2, SPAM1, PHKA1, KIAA1827, PSMF1, ASMT, ARF6, PTGER3, BPAG1, SRI,
GTF3C1, KIAA0903, FLJ22596, DTIPIB, DKFZp586C1923, UPLC1, ANKRD5, and
KIAA0992.

4. The method of claim 1 wherein the gene expression level of each of the at least five 
informative genes is determined by hybridization to an array of nucleic acid probes. 

5. The method of claim 4 wherein the array of nucleic acid probes comprises SEQ ID NOs
102-321. 

6. A method of identifying compounds that inhibit or promote metastasis of a colorectal 
tumor to the regional lymph nodes comprising: 
obtaining a first sample of cells from a colorectal tumor before treatment;

treating the tumor with the compound; 

obtaining a second sample of cells from the tumor after treatment; 

determining the expression level of at least five informative genes in the first and 
second samples, wherein the at least five informative genes are selected from the group
consisting of H2BFH, H2BFJ, LENEP, SPON1, FUT4, PC, NDUFB7, PADI2, SPAM1,
PHKA1, NKX2H, RAB7, ASMT, CASP5, PSMF1, SPN, PTGER3, FLB0708, RPS21, BAIAP3,
KIAA0734, DVL2, GRB14, FZD3, SRCAP, GTF3C1, RUNX1, EIF1A, NEBL, SYNGR3, PPL,
NPIP, BPAG1, SRI, KCNJ6, ARF6, PNMA1, UPLC1, BAI3, SSX2, FLJ22596, FLJ10251,
AF098968, KIAA1827, CLIPR-59, ANKRD5, KIAA0903, KIAA0992, FLB4237, clone
HRC00953, cDNA DKFZp434E0528, cDNA DKFZp586C1923, cDNA DKFZp434K1126,
PIGL, ILVBL, DNCH1, DRPLA, UBE2N, ADPRTL2, CGI-01, HSPCA, PANK2, ATP5J,
YWHAZ, AP2A2, GRWD, BICD1, PTPRA, MARK3, BS69, CREG, FLJ10482, FLJ11588,
DJ971N18.2, HSU79252, SP329, FLJ20420, DTIPIB, ELOVL5, EUROIMAGE 29222, and
DKFZP564A2416;

determining the direction of change between the expression level of each of the at least
five informative genes in the first and second samples;

identifying the compound as a promoter of metastasis if the direction of change for each 
of the at least five informative genes is: 

up and the gene is selected from the group consisting of H2BFH, H2BFJ, LENEP,
SPON1, PC, PADI2, SPAM1, PHKA1, NKX2H, RAB7, SPN, BAIAP3, DVL2, GRB14,
SRCAP, GTF3C1, RUNX1, SYNGR3, PPL, NPIP, BPAG1, ARF6, PNMA1, UPLC1, BAI3,
FLJ22596, FLJ10251, KIAA1827, CLIPR-59, KIAA0903, KIAA0992, FLB4237, clone
HRC00953, cDNA DKFZp586C1923, cDNA DKFZp434K1126, DNCH1, DRPLA, ADPRTL2,
HSPCA, YWHAZ, AP2A2, GRWD, BICD1, MARK3, FLJ10482, FLJ11588, ELOVL5,
EUROIMAGE 29222, and DKFZP564A2416; or

down and the gene is selected from the group consisting of FUT4, NDUFB7, ASMT,
CASP5, PSMF1, PTGER3, FLB0708, RPS21, KIAA0734, FZD3, EIF1A, NEBL, SRI, KCNJ6,
ANKRD5, SSX2, AF098968, DJ971N18.2, HSU79252, SP329, FLJ20420, DTIPIB, cDNA
DKFZp434E0528, PTPRA, BS69, CREG, PIGL, ILVBL, PANK2, ATP5J, CGI-01, and
UBE2N; or,

identifying the compound as an inhibitor of metastasis if the direction of change for each 
of the at least five informative genes is: 

down and the gene is selected from the group consisting of H2BFH, H2BFJ, LENEP,
SPON1, PC, PADI2, SPAM1, PHKA1, NKX2H, RAB7, SPN, BAIAP3, DVL2, GRB14,
SRCAP, GTF3C1, RUNX1, SYNGR3, PPL, NPIP, BPAG1, ARF6, PNMA1, UPLC1, BAI3,
FLJ22596, FLJ10251, KIAA1827, CLIPR-59, KIAA0903, KIAA0992, FLB4237, clone
HRC00953, cDNA DKFZp586C1923, cDNA DKFZp434K1126, DNCH1, DRPLA, ADPRTL2,
HSPCA, YWHAZ, AP2A2, GRWD, BICD1, MARK3, FLJ10482, FLJ11588, ELOVL5,
EUROIMAGE 29222, and DKFZP564A2416; or

up and the gene is selected from the group consisting of FUT4, NDUFB7, ASMT,
CASP5, PSMF1, PTGER3, FLB0708, RPS21, KIAA0734, FZD3, EIF1A, NEBL, SRI, KCNJ6,
ANKRD5, SSX2, AF098968, DJ971N18.2, HSU79252, SP329, FLJ20420, DTIPIB, cDNA
DKFZp434E0528, PTPRA, BS69, CREG, PIGL, ILVBL, PANK2, ATP5J, CGI-01, and
UBE2N.
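
For illustration only, the direction-of-change logic of claim 6 can be sketched as follows. This is a hedged sketch, not the claimed implementation: `direction`, `classify_compound`, and the dictionary layout are hypothetical names and structures, and `up_genes` and `down_genes` stand for the two gene groups recited in the claim.

```python
def direction(before, after):
    """Return 'up', 'down', or None when the level did not change."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return None


def classify_compound(first, second, up_genes, down_genes, min_genes=5):
    """Label a compound as 'promoter', 'inhibitor', or None (no call).

    first, second: dicts gene -> expression level before and after treatment.
    up_genes, down_genes: sets of genes expressed at higher / lower levels
        in metastatic tumors, as recited in the claim.
    """
    genes = [g for g in first if g in second and (g in up_genes or g in down_genes)]
    if len(genes) < min_genes:
        raise ValueError("at least %d informative genes are required" % min_genes)
    observed = {g: direction(first[g], second[g]) for g in genes}
    # Direction each gene moves when a tumor becomes metastatic.
    toward_metastasis = {g: ("up" if g in up_genes else "down") for g in genes}
    if all(observed[g] == toward_metastasis[g] for g in genes):
        return "promoter"
    if all(observed[g] is not None and observed[g] != toward_metastasis[g] for g in genes):
        return "inhibitor"
    return None
```

Returning `None` for mixed or unchanged profiles is a design choice of this sketch; the claim itself only defines the promoter and inhibitor outcomes.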

7. The method of claim 6 wherein at least twenty informative genes are analyzed at each step. 

8. The method of claim 7 wherein the at least twenty informative genes are NKX2H, H2BFJ, 
PADI2, SPAM1, PHKA1, KIAA1827, PSMF1, ASMT, ARF6, PTGER3, BPAG1, SRI,
GTF3C1, KIAA0903, FLJ22596, DTIPIB, DKFZp586C1923, UPLC1, ANKRD5, and
KIAA0992.



9. The method of claim 6 wherein gene expression level is determined using an array of 
nucleic acid probes. 

10. A method of predicting the efficacy of a compound for treating a colorectal tumor 
comprising the steps of: 

obtaining a first sample of cells derived from a colorectal tumor before treatment with the 
compound and a second sample of cells derived from the same tumor after treatment; 

determining a gene expression level for at least five informative genes in the first and
second samples, wherein informative genes are selected from the group consisting of H2BFH,
H2BFJ, LENEP, SPON1, FUT4, PC, NDUFB7, PADI2, SPAM1, PHKA1, NKX2H, RAB7,
ASMT, CASP5, PSMF1, SPN, PTGER3, FLB0708, RPS21, BAIAP3, KIAA0734, DVL2,
GRB14, FZD3, SRCAP, GTF3C1, RUNX1, EIF1A, NEBL, SYNGR3, PPL, NPIP, BPAG1,
SRI, KCNJ6, ARF6, PNMA1, UPLC1, BAI3, SSX2, FLJ22596, FLJ10251, AF098968,
KIAA1827, CLIPR-59, ANKRD5, KIAA0903, KIAA0992, FLB4237, clone HRC00953, cDNA
DKFZp434E0528, cDNA DKFZp586C1923, cDNA DKFZp434K1126, PIGL, ILVBL, DNCH1,
DRPLA, UBE2N, ADPRTL2, CGI-01, HSPCA, PANK2, ATP5J, YWHAZ, AP2A2, GRWD,
BICD1, PTPRA, MARK3, BS69, CREG, FLJ10482, FLJ11588, DJ971N18.2, HSU79252,
SP329, FLJ20420, DTIPIB, ELOVL5, EUROIMAGE 29222, and DKFZP564A2416;

comparing the gene expression level of each of the at least five informative genes in the
first and second samples to determine a direction of change in expression from the first to the 
second sample for each gene, wherein the direction is either up or down; 

comparing the direction of change for each gene to the direction of change for that gene
from a sample of a first known stage to a sample of a second known stage, wherein the first stage
is more advanced than the second stage; and 

identifying the compound as effective if the direction of change for each of the at least 
five informative genes is the same. 
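
For illustration only, the two comparing steps of claim 10 can be sketched as below. This is a sketch under stated assumptions: `change_direction`, `is_effective`, and the dictionary inputs are hypothetical, and each stage is represented by a single expression profile for simplicity.

```python
def change_direction(start, end):
    """Return 'up', 'down', or None when there is no change."""
    if end > start:
        return "up"
    if end < start:
        return "down"
    return None


def is_effective(before, after, advanced_stage, earlier_stage, min_genes=5):
    """Return True if treatment moves every gene the way it moves from the
    more advanced stage to the less advanced stage.

    before, after: dicts gene -> level in the first and second samples.
    advanced_stage, earlier_stage: dicts gene -> level in samples of the first
        (more advanced) and second (less advanced) known stages.
    """
    genes = [g for g in before
             if g in after and g in advanced_stage and g in earlier_stage]
    if len(genes) < min_genes:
        raise ValueError("at least %d informative genes are required" % min_genes)
    for gene in genes:
        treated = change_direction(before[gene], after[gene])
        staged = change_direction(advanced_stage[gene], earlier_stage[gene])
        if treated is None or treated != staged:
            return False
    return True
```

Because the reference direction runs from the more advanced stage to the less advanced one, a matching direction means the treated tumor's profile shifts toward the earlier stage.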

11. The method of claim 10 wherein gene expression levels are determined using an array of 
nucleic acid probes. 

12. The method of claim 10 wherein at least twenty informative genes are analyzed at each
step.

13. The method of claim 12 wherein the at least twenty informative genes are NKX2H, 
H2BFJ, PADI2, SPAM1, PHKA1, KIAA1827, PSMF1, ASMT, ARF6, PTGER3, BPAG1, SRI,
GTF3C1, KIAA0903, FLJ22596, DTIPIB, DKFZp586C1923, UPLC1, ANKRD5, and
KIAA0992.

14. A method of screening to identify test substances which induce or repress expression of 
genes which are induced or repressed in a colorectal tumor sample that has metastasized 
compared to a colorectal tumor sample that has not metastasized, comprising:

contacting a colorectal tumor cell with a test substance;

monitoring expression of a transcript or its translation product wherein the transcript is 
from a gene selected from the group consisting of H2BFH, H2BFJ, LENEP, SPON1, FUT4, PC,
NDUFB7, PADI2, SPAM1, PHKA1, NKX2H, RAB7, ASMT, CASP5, PSMF1, SPN, PTGER3,
FLB0708, RPS21, BAIAP3, KIAA0734, DVL2, GRB14, FZD3, SRCAP, GTF3C1, RUNX1,
EIF1A, NEBL, SYNGR3, PPL, NPIP, BPAG1, SRI, KCNJ6, ARF6, PNMA1, UPLC1, BAI3,
SSX2, FLJ22596, FLJ10251, AF098968, KIAA1827, CLIPR-59, ANKRD5, KIAA0903,
KIAA0992, FLB4237, clone HRC00953, cDNA DKFZp434E0528, cDNA DKFZp586C1923,
cDNA DKFZp434K1126, PIGL, ILVBL, DNCH1, DRPLA, UBE2N, ADPRTL2, CGI-01,
HSPCA, PANK2, ATP5J, YWHAZ, AP2A2, GRWD, BICD1, PTPRA, MARK3, BS69, CREG,
FLJ10482, FLJ11588, DJ971N18.2, HSU79252, SP329, FLJ20420, DTIPIB, ELOVL5,
EUROIMAGE 29222, and DKFZP564A2416.

15. A method of distinguishing between a Dukes' B and a Dukes' C stage tumor comprising: 
monitoring the expression level of any five or more genes selected from the group consisting of 

H2BFH, H2BFJ, LENEP, SPON1, FUT4, PC, NDUFB7, PADI2, SPAM1, PHKA1, NKX2H,
RAB7, ASMT, CASP5, PSMF1, SPN, PTGER3, FLB0708, RPS21, BAIAP3, KIAA0734,
DVL2, GRB14, FZD3, SRCAP, GTF3C1, RUNX1, EIF1A, NEBL, SYNGR3, PPL, NPIP,
BPAG1, SRI, KCNJ6, ARF6, PNMA1, UPLC1, BAI3, SSX2, FLJ22596, FLJ10251, AF098968,
KIAA1827, CLIPR-59, ANKRD5, KIAA0903, KIAA0992, FLB4237, clone HRC00953, cDNA
DKFZp434E0528, cDNA DKFZp586C1923, cDNA DKFZp434K1126, PIGL, ILVBL, DNCH1,
DRPLA, UBE2N, ADPRTL2, CGI-01, HSPCA, PANK2, ATP5J, YWHAZ, AP2A2, GRWD,
BICD1, PTPRA, MARK3, BS69, CREG, FLJ10482, FLJ11588, DJ971N18.2, HSU79252,
SP329, FLJ20420, DTIPIB, ELOVL5, EUROIMAGE 29222, and DKFZP564A2416; and
comparing the expression levels to a database of expression levels of the five or more genes in
Dukes' B and Dukes' C stage tumors.
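
For illustration only, one way the database comparison of claim 15 might be carried out is a nearest-centroid rule. The claim does not specify how the comparison is made, so the nearest-centroid technique, the Euclidean distance, and the names `stage_centroid` and `nearest_stage` are all assumptions of this sketch.

```python
from math import dist
from statistics import mean


def stage_centroid(samples, genes):
    """Mean expression of each gene across the samples of one stage."""
    return [mean(sample[gene] for sample in samples) for gene in genes]


def nearest_stage(sample, database, genes):
    """Return the stage label whose centroid lies closest to the sample.

    sample: dict gene -> expression level for the tumor being tested.
    database: dict stage label -> list of dicts gene -> level
        (hypothetical layout of the expression-level database).
    genes: list of five or more monitored gene symbols.
    """
    if len(genes) < 5:
        raise ValueError("five or more genes must be monitored")
    point = [sample[gene] for gene in genes]
    return min(
        database,
        key=lambda stage: dist(point, stage_centroid(database[stage], genes)),
    )
```

In practice expression values would usually be normalized before distances are computed; that step is omitted here to keep the sketch short.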

16. The method of claim 15 wherein the expression levels of at least twenty genes are
monitored and compared. 

17. The method of claim 16 wherein the at least twenty genes are NKX2H, H2BFJ, PADI2,
SPAM1, PHKA1, KIAA1827, PSMF1, ASMT, ARF6, PTGER3, BPAG1, SRI, GTF3C1,
KIAA0903, FLJ22596, DTIPIB, DKFZp586C1923, UPLC1, ANKRD5, and KIAA0992.

18. A method of classifying a colorectal tumor sample as being positive for regional lymph 
node metastases comprising:

isolating a nucleic acid sample from the colorectal tumor sample;
determining the expression level of at least five informative genes in the sample, wherein 
informative genes are selected from the group consisting of H2BFH, H2BFJ, LENEP, SPON1,
FUT4, PC, NDUFB7, PADI2, SPAM1, PHKA1, NKX2H, RAB7, ASMT, CASP5, PSMF1,
SPN, PTGER3, FLB0708, RPS21, BAIAP3, KIAA0734, DVL2, GRB14, FZD3, SRCAP,
GTF3C1, RUNX1, EIF1A, NEBL, SYNGR3, PPL, NPIP, BPAG1, SRI, KCNJ6, ARF6,
PNMA1, UPLC1, BAI3, SSX2, FLJ22596, FLJ10251, AF098968, KIAA1827, CLIPR-59,
ANKRD5, KIAA0903, KIAA0992, FLB4237, clone HRC00953, cDNA DKFZp434E0528,
cDNA DKFZp586C1923, cDNA DKFZp434K1126, PIGL, ILVBL, DNCH1, DRPLA, UBE2N,
ADPRTL2, CGI-01, HSPCA, PANK2, ATP5J, YWHAZ, AP2A2, GRWD, BICD1, PTPRA,
MARK3, BS69, CREG, FLJ10482, FLJ11588, DJ971N18.2, HSU79252, SP329, FLJ20420,
DTIPIB, ELOVL5, EUROIMAGE 29222, and DKFZP564A2416;

comparing the expression level of the at least five informative genes to the expression 
level of the same informative genes in at least one reference colorectal tumor sample that is 
negative for regional lymph node metastases; 

determining if the one or more informative genes in the colorectal tumor sample are
expressed at a higher level or a lower level relative to the reference; and

classifying the sample as having regional lymph node metastases if the direction of 
change in the expression of the at least five informative genes in the unknown sample relative to 
the reference is the same as the direction of change for that gene in Tables 2 and 3. 

19. A kit for classifying a colorectal tumor sample as being positive for regional lymph node 
metastases comprising: 

an array of probes wherein each probe is perfectly complementary to a single gene 
selected from the group of genes consisting of H2BFH, H2BFJ, LENEP, SPON1, FUT4, PC,
NDUFB7, PADI2, SPAM1, PHKA1, NKX2H, RAB7, ASMT, CASP5, PSMF1, SPN, PTGER3,
FLB0708, RPS21, BAIAP3, KIAA0734, DVL2, GRB14, FZD3, SRCAP, GTF3C1, RUNX1,
EIF1A, NEBL, SYNGR3, PPL, NPIP, BPAG1, SRI, KCNJ6, ARF6, PNMA1, UPLC1, BAI3,
SSX2, FLJ22596, FLJ10251, AF098968, KIAA1827, CLIPR-59, ANKRD5, KIAA0903,
KIAA0992, FLB4237, clone HRC00953, cDNA DKFZp434E0528, cDNA DKFZp586C1923,
cDNA DKFZp434K1126, PIGL, ILVBL, DNCH1, DRPLA, UBE2N, ADPRTL2, CGI-01,
HSPCA, PANK2, ATP5J, YWHAZ, AP2A2, GRWD, BICD1, PTPRA, MARK3, BS69, CREG,
FLJ10482, FLJ11588, DJ971N18.2, HSU79252, SP329, FLJ20420, DTIPIB, ELOVL5,
EUROIMAGE 29222, and DKFZP564A2416; and wherein the array comprises at least 2 probes
for each gene in the group of genes;

a computer-readable medium having computer-executable instructions for performing a
method comprising: comparing the gene expression level of at least three informative genes in an
experimental sample to the expression level of the corresponding gene in a plurality of samples 
of known Dukes' stage wherein the Dukes' stage is selected from Dukes' B and Dukes' C and 
determining if the experimental sample is Dukes' B or Dukes' C stage; and 
a computer-readable medium having a plurality of gene expression level values for each
of the genes in the group of genes in a plurality of colorectal tumor samples of known Dukes'
stage. 